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We investigate Monte Carlo energy and variance minimization techniques for optimizing many-body wave func- 
tions. Several variants of the basic techniques are studied, including limiting the variations in the weighting factors 
which arise in correlated sampling estimations of the energy and its variance. We investigate the numerical stabil- 
ity of the techniques and identify two reasons why variance minimization exhibits superior numerical stability to 
energy minimization. The characteristics of each method are studied using a non-interacting 64-electron model of 
crystalline silicon. While our main interest is in solid state systems, the issues investigated are relevant to Monte 
Carlo studies of atoms, molecules and solids. We identify a robust and efficient variance minimization scheme for 
optimizing wave functions for large systems. 



I. INTRODUCTION 

Accurate approximations to many-body wave func- 
tions are crucial for the success of quantum Monte Carlo 
(QMC) calculations. In the variational quantum Monte 
Carlo (VMC) method 0,0| expectation values are cal- 
culated as integrals over configuration space, which are 
evaluated using standard Monte Carlo techniques. In 
the more sophisticated diffusion Monte Carlo (DMC) 
method y,|^ imaginary time evolution of the Schrodinger 
equation is used to calculate very accurate expectation 
values. Importance sampling is included via a trial wave 
function and the fermion sign-problem is evaded by using 
the fixed-node approximation. 

The most costly part of VMC and DMC calculations 
is normally the evaluation of the trial wave function (and 
its gradient and Laplacian) at many different points in 
configuration space. The accuracy of the trial wave func- 
tion controls the statistical efficiency of the algorithm 
and limits the final accuracy that can be obtained. It is 
therefore necessary to use trial wave functions which are 
as accurate as possible yet can be computed rapidly. By 
far the most common type of trial wave function used 
in VMC and DMC calculations for atoms, molecules and 
solids is the Slater-Jastrow form: 
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where N is the number of electrons, D\ and D^^ are Slater 
determinants of spin-up and spin-down single-particle 
orbitals, the /3„ are coefficients, x is a one-body function, 
and u is a relative-spin-dependent two-body correlation 
factor. 

The functions u and x normally contain variable pa- 
rameters, and one may also wish to vary the /3„ and pa- 
rameters in the single-particle orbitals forming the Slater 



determinants. The values of the parameters are obtained 
via an optimization procedure. Typical solid state prob- 
lems currently involve optimizing of order 10^ parameters 
for lO'^-dimensional functions. These optimization prob- 
lems are delicate and require careful handling. 

In this paper we investigate several variants of energy 
and variance minimization techniques. Our aims arc (i) 
to identify the reasons why variance minimization ex- 
hibits superior numerical stability to energy minimiza- 
tion, and (ii) to identify the best variance minimization 
scheme for optimizing wave functions in large systems. 
We concentrate on two areas, the nature of the objective 
function (Section ||) and the effects of approximating 
the required integrals by finite sums (Section pl[ ). In 
Section |^ we use a 64-electron model of crystalline sili- 
con to investigate the behaviour of various optimization 
schemes, while in Section M we draw our conclusions. 



II. THE OBJECTIVE FUNCTION 

In order to optimize a wave function we require an ob- 
jective function, i.e., a quantity which is to be minimized 
with respect to a set of parameters, {a}. The criteria 
that a successful objective function should satisfy for use 
in a Monte Carlo optimization procedure arc that (i) the 
global minimum of the objective function should corre- 
spond to a high quality wave function, (ii) the variance 
of the objective function should be as small as possible, 
and (iii) the minimum in the objective function should 
be as sharp and deep as possible. One natural objective 
function is the expectation value of the energy. 



E^J = 



/$2(q,) [$-i(a)7J$(a)]dR 

/$2(Q,)rfR 



(2) 



where the integrals are over the SA'^-dimensional con- 
figuration space. The numerator is the integral over 
the probability distribution, $^(a), of the local energy, 



In fact the energy is not the preferred objective func- 
tion for wave function optimization, and the general con- 
sensus is that a better procedure is to minimize the vari- 
ance of the energy, which is given by 



A{a) 






(3) 



Optimizing wave functions by minimizing the variance 
of the energy is actually a very old idea, having being 
used in the 1930's. The first application using Monte 
Carlo techniques to evaluate the integrals appears to have 
been by Conroy B, but the present popularity of the 
method derives from the developments of Umrigar and 
coworkers pM . A number of reasons have been advanced 
for preferring variance minimization, including: (i) it has 
a known lower bound of zero, (ii) the resulting wave func- 
tions give good estimates for a range of properties, not 
just the energy, (iii) it can be applied to excited states, 
(iv) efficient algorithms are known for minimizing objec- 
tive functions which can be written as a sum of squares, 
and (v) it exhibits greater numerical stability than en- 
ergy minimization. The latter point is very significant 
for applications to large systems. 

The minimum possible value of A{a) is zero. This 
value is obtained if and only if $(a) is an exact eigen- 
state of H. Minimization of A{a) has normally been 
carried out via a correlated sampling approach in which 
a set of configurations distributed according to <i>2(ao) is 
generated, where ao is an initial set of parameter values. 
A{a) is then evaluated as 



A{a) 



J $^(ao) w{a) [Ei^ja) ~ £^v(a)]^ dR 



(4) 



where the integrals contain a weighting factor, w{a), 
given by 



w{a) 






(5) 



A{a) is then minimized with respect to the parameters 
{a}. The set of configurations is normally regenerated 
several times with the updated parameter values so that 
when convergence is obtained {ao}={a}. A variant of 
Eq. U is obtained by replacing the energy Ev{a) by a 
fixed value, E, giving 



Bia) = 



J $2(q,o) w{a) [Ei^ia) - Ef dH 
/$2(ao) w(a)dR 



(6) 



Note that \i E < Eq, where Eq is the exact ground state 
energy, then the minimum possible value of B(a) occurs 
when $=$0, the exact ground state wave function. Min- 
imization of B{a) is equivalent to minimizing a linear 
combination of Ey and A{a). The absolute minima of 
both Ey and A{a) occur when $=$0. If both of the co- 
efficients of Ey and A[a) in the linear combination are 



positive, which is guaranteed ii E < Eq, then it follows 
that the absolute minimum of i?(Q:) occurs at $=<I>o- Us- 
ing this method with E < Eq allows optimization only 
of the ground state wave function. 

Although minimization of A{a) or B{a) using cor- 
related sampling methods has often been successful, in 
some cases the procedure can exhibit a numerical in- 
stability. Two situations where this is likely to occur 
have been identified. The first is when the nodes of the 
trial wave function are allowed to alter during the op- 
timization process. A similar instability can arise when 
the number of electrons in the systems becomes large, 
which can result in an instability even if the nodes of 
the trial wave function remain fixed. The characteristic 
of these numerical instabilities is that during the mini- 
mization procedure a few configurations (often only one) 
acquire a very large weight. The estimate of the variance 
is then reduced almost to zero by a set of parameters 
which are found to give extremely poor results in a sub- 
sequent QMC calculation. When the nodes of the trial 
wave function are altered large weights are most likely to 
occur for configurations close to the zeros of the proba- 
bility distribution $^(010). Large weights can also occur 
when varying the Jastrow factor if the number of elec- 
trons, N , is large. For a small change in the one-body 
function, Sx, the local energy changes by an amount pro- 
portional to N6x, but the weight is multiplied by a fac- 
tor which is exponential in Ndx, which can result in very 
large or very small weights if N is large. A similar ar- 
gument holds for changes in the two-body term, which 
shows an even more severe potential instability because 
the change in the two-body term scales like N'^. 

The instability due to the weights has been noticed by 
many researchers. In principle one could overcome this 
instability by using more configurations, but the number 
required is normally impossibly large. Various practical 
ways of dealing with this instability have been devised. 
One method is to limit the upper value of the weights M 
or to set the weights equal to unity |§|,||. Schmidt and 
Moskowitz [H set the weights equal to unity in calcula- 
tions for small systems in which the nodes were altered. 
An alternative approach is to draw the configurations 
from a modified probability distribution which is positive 
definite, so that the weights do not get very large. 11^] 
In our calculations for large systems of up to 1000 elec- 
trons 111) we also set the weights equal to unity while 
optimizing the Jastrow factor. When using the corre- 
lated sampling approach, whether or not the weights are 
modified, better results are obtained by periodically re- 
generating a new set of configurations chosen from the 
distribution $2 (a), where {a} is the updated parame- 
ter set. This helps the convergence of the minimization 
procedure. One can also restrict the allowed variation 
in the parameters {a} before regenerating a new set of 
configurations, but this can slow the convergence. We 
found Q that setting the weights to unity allowed us to 
alter the parameters by a larger amount before we had to 
regenerate the configurations with the new set of param- 



eters. After a few (typically three or four) regenerations 
we found that the parameters had converged to stable 
values giving a small variance and low energy in a subse- 
quent VMC calculation. 

These strategies can often overcome the numerical in- 
stability. Our goal is to apply QMC methods to large 
systems with many inequivalent atoms, which will re- 
quire wave functions for many electrons with many vari- 
able parameters. We would like to be able to optimize 
the determinantal part of the wave function as well as the 
Jastrow factor, which has only recently been attempted 
for solids ||l^, and we would also like to optimize ex- 
cited states as well as ground states. In order to accom- 
plish these goals we will need to improve our optimization 
techniques. In this paper we analyse energy and vari- 
ance minimization techniques, in the expectation that a 
deeper understanding of the issues of numerical stability 
will lead to improved algorithms. 

First we analyse the procedure of setting the weights to 
unity, which gives a new objective function, C{a), where 



C'ia) 



and 



/$2(ao)dR 



/$2(ao) [$-i(a)H$(a)]dR 

/$2(Q,o)dR 



(7) 



(8) 



is the unweighted energy. The objective function C{a) 
has the property that its absolute minimum is zero and 
that this value is obtained if and only if $(«) is an 
exact eigenstate of H, because for an exact eigenstate 
E'l = Ec- The absolute minima of C{a) are therefore at 
the same positions as those of A{a) and therefore C{a) 
should be a satisfactory objective function. As we will 
show by explicit example in Section IV, the advantage of 



C{a) is that it has a lower variance than A{a) , especially 
when ao and a differ significantly. A similar analysis can 
be applied to the case where the weights are subject to 
an upper limit, and we will refer to all such expressions 
with modified weights as variants of C and Ec- 

The objective function C{a) contains the unweighted 
ener gy Eg. As we will show by explicit example in Sec- 
tion [V, the ground state of H does not necessarily cor- 
respond to the minimum value of Ec- The energy Ec 
is therefore not a satisfactory objective function in its 
own right. If we replace the energy Ec{a) in Eq. ^ by 
some other energy E, then the minima of the objective 
function occur at the eigenstates of H if and only if E 
evaluated with the exact wave function is equal to the 
exact energy of the eigenstate. This requirement still al- 
lows freedom in the choice of E, and the following form 
is sufficient. 



£; = 



Jp{R.)Ei^{a)dR 
Jp{R)dR 



(9) 



and the variance C independently, without shifting the 
positions of the absolute minima of C. In this work we 
have not investigated this freedom and we have always 
used the same weights for Ec and C. 

The above analysis applies for wave functions with suf- 
ficient variational freedom to encompass the exact wave 
function. In practical situations we are unable to find 
exact wave functions and it is important to consider the 
effect this has on the optimization process. Although the 
objective functions A{a) and C{a) are unbiased in the 
sense that the exact ground state wave function corre- 
sponds to an absolute minimum, C(a) is biased in the 
sense that for a wave function which cannot be exact the 
optimized parameters will not exactly minimize the true 
variance. We refer to this as a "weak bias" because it 
disappears as the wave function tends to the exact one. 
In practice this is not a problem because in minimizing 
C{a) we regenerate the configurations several times with 
the updated distribution until convergence is obtained, 
so that minimization of A{a) and C(a) turns out to give 
almost identical parameter values. On the other hand, 
the unweighted energy, Ec, shows a "strong bias" in the 
sense that the nature of its stationary points are very dif- 
ferent from those of the properly weighted energy. The 
ability to alter the weights while not affecting the posi- 
tions of the minima is an important advantage of variance 
minimization over energy minimization, which we believe 
is one of the factors which leads to the greater numerical 
stability of variance minimization. 



III. FURTHER EFFECTS OF FINITE SAMPLING 

In the previous section we described the numerical in- 
stability arising from the weighting factors. The origin 
of this problem lies in approximating the integrals by the 
average of the integrand over a finite set of points in con- 
figuration space. There is another important issue con- 
nected with the approximation of finite sampling, which 
is whether the positions of the minima of the objective 
function are altered by the finite sampling itself. 

Consider the objective function A{a), in the case where 
the trial wave function has sufficient variational freedom 
to encompass the exact wave function. Approximating 
Eq. ^ by an average over the set {Ri} containing Ng 
configurations drawn from the distribution ^'^(ao) gives 



^N 



A 



Ns 



Er° w{R,;a)[Ei^{Rf,a) - Ey{{R,}; a)]' 



(10) 



where p(R) is any probability distribution. This demon- 
strates that we can alter the weights in the energy Ec 



The eigenstates of H give A^= = for any size of sample 
because Ei^ — Ey for an eigenstate. Clearly this result 
also holds for C(a). This behaviour contrasts with that 
of the variational energy, Ey- Consider a finite sampling 
of the variational energy of Eq. 0, where the configura- 



tions are distributed according to ^^(ao) and properly 
weighted, 



^N 



hy - 



j:'^^w{R,;a)Ei,{Rf,a) 



^N 



J2t " w{'R^■,a) 



(11) 



pN, 



The global minima of Ey" are not guaranteed to lie at 
the eigenstates of H for a finite sample. The fact that 
the positions of the global minima of A{a) and C(a) are 
robust to finite sampling is a second important advantage 
of variance minimization over energy minimization. 



IV. TESTS OF MINIMIZATION PROCEDURES 

We now investigate the performance of the various en- 
ergy and variance minimization techniques for a solid 
state system. We would like to know the exact wave func- 
tion for our test system, and therefore we have chosen a 
non-interacting system. We model the valence states of 
silicon in the diamond structure, using periodic bound- 
ary conditions to simulate the solid. The fee simulation 
cell contains 16 atoms and 64-electrons. The electrons 
are subject to a local potential which is described by 
two Fourier components, Vm — -0.1 a.u., and V220 — 
-0.06 a.u., chosen to give a reasonable description of the 
valence bandstructure of silicon. The value of Vm is in 
good agreement with empirical pseudopotential form fac- 
tors for silicon |l^], while the value of V220 is somewhat 
larger. Overall this model gives a reasonable description 
of the valence states of silicon and retains the essential 
features for testing the optimization techniques. 

The "exact" single-particle orbitals were obtained by 
diagonalizing the Hamiltonian in a plane-wave basis set 
containing all waves up to an energy cutoff of 15 a.u. 
This basis set is still incomplete, but the square root of 
the variance of the energy is about 0.02 eV per atom, 
which is negligible for our purposes. We have added a 
variational parameter, a, in the form of a % function 
with the full symmetry of the diamond structure: 



Kr)=af^PGe'°"-j 



(12) 



where G labels the 8 reciprocal lattice vectors of the [111] 
star and Pg is a phase factor associated with the non- 
symmorphic symmetry operations. The exact value of 
the parameter, a, is, of course, zero. To model the situ- 
ation where the wave function does not possess the vari- 
ational freedom to encompass the exact one we used a 
smaller basis set cutoff of 2.5 a.u. The variational energy 
from this wave function is 0.35 eV per atom above the 
exact value, which is typical of the values we encounter 
in our solid state calculations. The optimal value of a 
for this inexact wave function is very close to zero. 



This model exhibits all the numerical problems we have 
encountered in optimization procedures. In practical sit- 
uations one may have more electrons and more parame- 
ters to optimize, which makes the numerical instabilities 
more pronounced. In order to analyse the behaviour in 
detail we have evaluated the variance of the objective 
functions. We found that unfeasibly large numbers of 
electron configurations were required to obtain accurate 
values of the variance of the objective functions for wave 
functions with many more electrons and variables param- 
eters than used in our model system. We stress that when 
the numerical instabilities are more pronounced it is even 
more advantageous to adopt the optimization strategies 
recommended here. 

We generated samples of 0.96 x 10^ statistically in- 
dependent electronic configurations which were used to 
calculate the quantities involved in the various optimiza- 
tion schemes. In practical applications, a typical number 
of configurations used might be 10"*, but we found it nec- 
essary to use a much larger number to obtain sufficiently 
accurate values of the different objective functions and, 
particularly, their variances. In a practical application 
an objective function, say C{a), is evaluated using, say, 
10'* configurations. The quantities of interest are then 
C{a) and its variance calculated as averages over blocks 
of lO'* configurations. Because the numerator in C{a) 
contains Ey, which is itself a sum over configurations, 
the values of C{a) and its variance depend on the num- 
ber of configurations in the block. The variance of C(a) 
calculated as such a block average is much more sensi- 
tive to the block size than the value of C{a). As the 
number of configurations in the block increases the val- 
ues of C{a) and its variance converge to their true values. 
(Analagous arguments hold for A{a).) Quoting all our 
results as a function of the block size would result in an 
enormous increase in the amount of data. However, for 
our silicon model, the variances of the objective functions 
are close to their true asymptotic values for block sizes of 
10* configurations or greater, so the values at the limit 
of large block sizes are the relevant ones for practical 
applications, and these are the values we quote here. 

The configurations were generated by a Metropolis 
walk distributed according to <i>^, using the inexact re- 
duced basis-set wave function. An optimization proce- 
dure typically starts with non-optimal parameter values 
which are improved during the optimization procedure. 
We present results for configurations generated with the 
non-optimal value of ao = 0.03, which gives results typ- 
ical of the starting value for an optimization, and oq — 
0, which is the final value from a successful optimization 
procedure. The qualitative behaviour is not strongly in- 
fluenced by the value of ao. 

First we consider energy minimization. In Fig. m we 
plot the weighted and unweighted mean energies, Ey 
and Ec, and their variances as a function of a, with 
configurations generated from ao — 0.03 (Fig. |l|a) and 
ao — (Fig- |l]b). The unweighted mean energy has a 
maximum at ao, i.e., the value from which the configu- 



rations were generated. This result can be understood as 
follows. Consider a wave function of the form 



<i> = ^/3„Z?T25iexp 



y^^akJk 



(13) 



where the ak are parameters, and the Jk are correlation 
functions. The mean unweighted energy can be written 
as 

Ec{{ak}) = {ak -akQ)Gki{ai - am) + constant, 

(14) 

where a^o are the parameter values from which the con- 
figurations are generated and 



G, 



kl 



U<^H{ak0})J:^^^JkV^JldR 

J<i>mako})dR 



(15) 



The Gki and the constant term depend on the ako but 
not on the ak- When there is only a single parameter, 
G is negative, so that Ec is a quadratic function with a 
maximum at ak = ako- When there is more than one 
parameter the stationary point of the quadratic can be a 
maximum, minimum or saddle point, which is not accept- 
able behaviour for an objective function. The weights 
may be altered in other ways, such as limiting their up- 
per value, but if the weights are altered the minima of 
the energy are moved, which is a "strong bias" in the 
objective function. If one insists on using an energy min- 
imization method, weighting must be used. 

We now investigate the distributions of the weights and 
the local energies. In Fig. we plot the distributions of 
the weights for ao = 0.03 and a = and for ao = and 
a = 0.03, while in Fig. ^ we plot the corresponding dis- 
tributions of the local energies. The distributions of the 
weights resemble Poisson distributions, but the square 
roots of the variances are significantly greater than the 
means, so there are more configurations at large weights 
than for a Poisson distribution with the same mean. The 
local energies follow normal distributions relatively well. 
As expected, the distributions of the local energies is 
wider for the ao = 0.03 wave function. Closer inspec- 
tion reveals that the distribution of the local energies is 
not exactly normal because the actual distributions have 
"fat tails" . The outlying energies result from outlying 
values of the kinetic energy. The standard deviations 
are a — 0.964 a.u. and a — 0.726 a.u., for ao = 0.03 
and 0, respectively. The expected percentage of configu- 
rations beyond 3cr from the mean of a normal distribu- 
tion is 0.27%, but the actual percentages are 0.443% and 
0.608% for ao = 0.03 and 0, respectively. Although these 
outlying local energies give a negligible contribution to 
the mean energy, calculated with or without weighting, 
and only a very small contribution to the values of the 
variance-like objective functions, A(a), B(a), and C{a), 
they give significant contributions to the variances of the 
variance-like objective functions. 



It is highly undesirable for an objective function to 
have a large variance. A larger variance implies that a 
greater number of configurations is required to determine 
the objective function to a given accuracy. However, as 
noted above, only the variances and not the means of 
A{a), B{a), and C(a) are significantly affected by these 
outlying configurations. We therefore limit the outly- 
ing local energies. An alternative would be to delete the 
outlying configurations, but this introduces a greater bias 
and is not as convenient in correlated-sampling schemes. 
The limiting must be done by the introduction of an ar- 
bitrary criterion, which we have implemented as follows. 
First we calculate the standard deviation of the sampled 
local energies, a. We then calculate limiting values for 
the local energy as those beyond which the total expected 
number of configurations based on a normal distribution 
is less than A, where 



A = iV, X lO^P 



(16) 



Ng is the total number of configurations and p is typically 
chosen to be 8, although varying p from 4 to 12 makes no 
significant difference to the results. We include the factor 
of Ns rather than limiting the energies beyond a given 
number of standard deviations to incorporate the concept 
that as more configurations are included, the sampling is 
improved. In the limit of perfect sampling, Ns — > oo, the 
objective functions are unchanged. For our silicon sys- 
tem, the percentage of configurations having their local 
energies limited by this procedure, with p = 8, is only 
0.024% and 0.047% for ao = 0.03 and 0, respectively, 
which corresponds to those beyond 5.7 standard devia- 
tions from the mean. The effect of limiting the outlying 
local energies is illustrated in Fig. ^ In Figs. ^ and c 
we plot the mean values of the objective functions C and 
A versus a for configurations generated with ao — 0.03, 
with values of the limiting power, p, in Eq. |l^, of 4, 8, 12 
and infinity (no limiting), while in Figs. Hb and d we plot 
their variances. The mean values of C are hardly affected 
by the limiting, while those of A are only slightly altered. 
The smaller variances of C and A obtained by limiting 
the values of the local energy are very clear. In fact, if the 
local energies are not limited then the variances of the ob- 
jective function are not very accurately determined, even 
with our large samples of 0.96 x 10^ configurations. Sim- 
ilar results hold for configurations generated with ao = 
0. We are not aware of other workers limiting the local 
energies in this way. This method can significantly re- 
duce the variance of the variance-like objective functions 
without significantly affecting their mean values. Lim- 
iting the local energies is even more advantageous when 
small numbers of configurations are used. As mentioned 
above, in practical applications one evaluates the objec- 
tive functions as averages over a sample of some given 
size, so that the variance of interest is the variance for 
that block size. When the block size is small the vari- 
ances of objectives functions A and C increase, but this 
effect is greatly reduced by limiting the local energies. 



Limiting the local energies in the way we have described 
gives significantly better numerical behaviour for all the 
variance-like objective functions and therefore all data 
shown in Figs. ^-0 have been limited with p=8, unless 
explicitly stated otherwise. 

Limiting the values of the weights is a crucial part of 
the variance minimization procedure for large systems. 
Comparison of Figs. |jb and d shows that the variance of 
the unweighted objective function C{a) is smaller than 
that of the weighted objective function, A{a), for all val- 
ues of a, provided one limits the local energies. The vari- 
ances close to the minimum are similar but away from the 
minimum the variance of A increases much more rapidly 
than that of C. The smaller variance of C indicates the 
superior numerical stability of the unweighted function. 
Qualitatively similarly behaviour occurs for configura- 
tions generated with ao = 0. A commonly used alter- 
native to setting the weights equal to unity is to limit 
the maximum value of the weights. In Fig. |5| we show 
data for objective function C with the largest value of the 
weights limited to multiples of 1 and 10 times the mean 
weight, along with data for the weights set to unity. In 
this graph the standard deviations of the objective func- 
tions are plotted as error bars. Fig. shows that the 
variance of C is reduced as the weights are more strongly 
limited, but the lowest variance is obtained by setting 
the weights to unity. In addition, when the weights are 
limited the curvature of the objective function is reduced, 
which makes it more difficult to locate the minimum. We 
therefore conclude that setting the weights to unity gives 
the best numerical stability. 

Finally, we study the effect of using the objective func- 
tion B{a) (Eq. 0), in which the variational energy, Ey, is 
replaced by a fixed reference energy, E, which is chosen 
to be lower than the exact energy. In Fig. 13 we show the 
objective function B{a) versus a for configurations gen- 
erated with ao = 0. The overall shapes of the curves are 
hardly changed as E is decreased, although the variance 
of the objective function slowly increases. If E is chosen 
to be too low then a significant amount of energy mini- 
mization is included and the numerical stability deterio- 
rates. The objective function B does have the property 
that its variance is independent of the block size, so that 
it does not show the increase in variance at short block 
sizes, but in practice we have not found this to be an 
important advantage. Using a value of E slightly below 
Ey appears to offer no significant advantages. 

A direct comparison of the different variance-like ob- 
jective functions is made in Fig. Q The behaviour of the 
following objective functions are displayed: (i) A, (ii) B 
with E = Ey — 0.3750 a.u., (iii) C, and (iv) a variant of 
C with the maximum value of the weights limited to 10 
times the mean weight. Limiting outlying values of the 
local energy improves the behaviour of all the objective 
functions, so in each case we have limited them according 
to Eq. |l^ with p=8. The mean values of the objective 
functions are plotted in Fig. |^a, which shows them to 
behave similarly, with the positions of the minima being 



almost indistinguishable. However, the curve for the vari- 
ant of C with limited weights is somewhat flatter, which 
is an undesirable feature. The standard deviations of 
the objective functions are plotted in Fig. |7|b, and here 
the differences are more pronounced. The unweighted 
variance, C, has the smallest variance, which is slightly 
smaller than that of the variant of C with strongly limited 
weights. The variances of the objective functions which 
include the full weights increase rapidly away from a=0. 
This rapid increase is highly undesirable and can lead to 
numerical instabilities. 



V. CONCLUSIONS 

We have analysed energy and variance minimiza- 
tion schemes for optimizing many-body wave functions, 
where the integrals involved are evaluated statistically. 
We have suggested two reasons why variance minimiza- 
tion techniques are numerically more stable than energy 
minimization techniques: 

1. In variance minimization it is allowable to limit the 
weights or set them equal to unity, which reduces 
the variance of the objective function while intro- 
ducing only a "weak bias" , which disappears as the 
process converges. Altering the weights in energy 
minimization normally leads to a badly behaved 
objective function. 

2. Variance minimization, with or without altering the 
weights, shows greater numerical stability against 
errors introduced by finite sampling because the po- 
sitions of the minima of the variance are not shifted 
by the finite sampling, whereas those of the (prop- 
erly weighted) energy are. 

We have studied optimization strategies for a realistic 
model of the valence electronic structure of diamond- 
structure silicon. The best strategy we have found is: 

1. Minimize the variance of the unweighted local en- 
ergy (objective function C, Eq. 0). 

2. Limit outlying values of the local energy according 
to Eq. m. 

3. Regenerate the configurations several times with 
the updated parameter values until convergence is 
obtained. 

This stategy may be applied to both ground and ex- 
cited states of atoms, molecules and solids. It has been 
designed to be optimal for systems containing many elec- 
trons. The behaviour that we have observed in numerous 
wave function optimizations for large systems is consis- 
tent with the analysis presented in this paper and in- 
dicates that the above optimization strategy is robust, 
accurate and efficient. 
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FIG. 1. Weighted and unweighted mean energies and standard deviations, shown as error bars, for configurations generated 
with (a) Qo=0.03 and (b) ao=0. 
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FIG. 2. Distributions of weights for configurations generated with ao=0.03 and ao=0, evaluated with a=0 and q=0.03, 
respectively. 
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FIG. 3. Distributions of the local energies for a—0 and a=0.03. 
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FIG. 4. The effect of limiting outlying energies on (a,b) objective function C (the unweighted variance) and (c,d) objective 
function A (the weighted variance) with ao=0.03. Outlying energies are limited as in Eq. hq with the values of p shown. 
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FIG. 5. The unweighted objective function C generated with ao = and with hmiting of the weights. 
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FIG. 6. Objective function B versus a with qq = and E < Ey. 
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FIG. 7. Comparison of variance-like objective functions with ao = as a function of a. 
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